max node
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
Message-Passing for Approximate MAP Inference with Latent Variables
We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Utah (0.04)
- North America > United States > Maryland (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning Jean-Bastien Grill Michal Valko Rémi Munos SequeL team, INRIA Lille - Nord Europe, France Google DeepMind, UK
You are a robot and you live in a Markov decision process (MDP) with a finite or an infinite number of transitions from state-action to next states. You got brains and so you plan before you act. Luckily, your roboparents equipped you with a generative model to do some Monte-Carlo planning. The world is waiting for you and you have no time to waste. You want your planning to be efficient.
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
Optimizing $\alpha\mu$
Cazenave, Tristan, Legras, Swann, Ventos, Véronique
$\alpha\mu$ is a search algorithm which repairs two defaults of Perfect Information Monte Carlo search: strategy fusion and non locality. In this paper we optimize $\alpha\mu$ for the game of Bridge, avoiding useless computations. The proposed optimizations are general and apply to other imperfect information turn-based games. We define multiple optimizations involving Pareto fronts, and show that these optimizations speed up the search. Some of these optimizations are cuts that stop the search at a node, while others keep track of which possible worlds have become redundant, avoiding unnecessary, costly evaluations. We also measure the benefits of parallelizing the double dummy searches at the leaves of the $\alpha\mu$ search tree.
Recurrent Sum-Product-Max Networks for Decision Making in Perfectly-Observed Environments
Tatavarti, Hari Teja, Doshi, Prashant, Hayes, Layton
Recent investigations into sum-product-max networks (SPMN) that generalize sum-product networks (SPN) offer a data-driven alternative for decision making, which has predominantly relied on handcrafted models. SPMNs computationally represent a probabilistic decision-making problem whose solution scales linearly in the size of the network. However, SPMNs are not well suited for sequential decision making over multiple time steps. In this paper, we present recurrent SPMNs (RSPMN) that learn from and model decision-making data over time. RSPMNs utilize a template network that is unfolded as needed depending on the length of the data sequence. This is significant as RSPMNs not only inherit the benefits of SPMNs in being data driven and mostly tractable, they are also well suited for sequential problems. We establish conditions on the template network, which guarantee that the resulting SPMN is valid, and present a structure learning algorithm to learn a sound template network. We demonstrate that the RSPMNs learned on a testbed of sequential decision-making data sets generate MEUs and policies that are close to the optimal on perfectly-observed domains. They easily improve on a recent batch-constrained reinforcement learning method, which is important because RSPMNs offer a new model-based approach to offline reinforcement learning.
- North America > United States > Georgia > Clarke County > Athens (0.14)
- Europe > Denmark > North Jutland > Aalborg (0.04)
- Workflow (0.69)
- Research Report (0.50)
The {\alpha}{\mu} Search Algorithm for the Game of Bridge
Cazenave, Tristan, Ventos, Véronique
{\alpha}{\mu} is an anytime heuristic search algorithm for incomplete information games that assumes perfect information for the opponents. {\alpha}{\mu} addresses the strategy fusion and non-locality problems encountered by Perfect Information Monte Carlo sampling. In this paper {\alpha}{\mu} is applied to the game of Bridge.
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Minimax or Maximin? – Becoming Human: Artificial Intelligence Magazine
Minimax, as the name suggest, is a method in decision theory for minimizing the maximum loss. Alternatively, it can be thought of as maximizing the minimum gain, which is also know as Maximin. It all started from a two player zero-sum game theory, covering both the cases where players take alternate moves and those where they made simultaneous moves. It has also been extended to more complex games and to general decision making in the presence of uncertainty. In the above explanation, it has been mentioned that the minimax algorithms started off with the concept of zero-sum.
Monte-Carlo Tree Search by Best Arm Identification
Kaufmann, Emilie, Koolen, Wouter
We consider two-player zero-sum turn-based interactions, in which the sequence of possible successive moves is represented by a maximin game tree T. This tree models the possible actions sequences by a collection of MAX nodes, that correspond to states in the game in which player A should take action, MIN nodes, for states in the game in which player B should take action, and leaves which specify the payoff for player A. The goal is to determine the best action at the root for player A. For deterministic payoffs this search problem is primarily algorithmic, with several powerful pruning strategies available [20]. We look at problems with stochastic payoffs, which in addition present a major statistical challenge. Sequential identification questions in game trees with stochastic payoffs arise naturally as robust versions of bandit problems. They are also a core component of Monte Carlo tree search (MCTS) approaches for solving intractably large deterministic tree search problems, where an entire sub-tree is represented by a stochastic leaf in which randomized play-out and/or evaluations are performed [4]. A play-out consists in finishing the game with some simple, typically random, policy and observing the outcome for player A. For example, MCTS is used within the AlphaGo system [21], and the evaluation of a leaf position combines supervised learning and (smart) play-outs. While MCTS algorithms for Go have now reached expert human level, such algorithms remain very costly, in that many (expensive) leaf evaluations or play-outs are necessary to output the next action to be taken by the player. In this paper, we focus on the sample complexity of Monte-Carlo Tree Search methods, about which very little is known. For this purpose, we work under a simplified model for MCTS already studied by [22], and that generalizes the depth-two framework of [10].
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
Grill, Jean-Bastien, Valko, Michal, Munos, Remi
We study the sampling-based planning problem in Markov decision processes (MDPs) that we can access only through a generative model, usually referred to as Monte-Carlo planning. Our objective is to return a good estimate of the optimal value function at any state while minimizing the number of calls to the generative model, i.e. the sample complexity. We propose a new algorithm, TrailBlazer, able to handle MDPs with a finite or an infinite number of transitions from state-action to next states. TrailBlazer is an adaptive algorithm that exploits possible structures of the MDP by exploring only a subset of states reachable by following near-optimal policies. We provide bounds on its sample complexity that depend on a measure of the quantity of near-optimal states. The algorithm behavior can be considered as an extension of Monte-Carlo sampling (for estimating an expectation) to problems that alternate maximization (over actions) and expectation (over next states). Finally, another appealing feature of TrailBlazer is that it is simple to implement and computationally efficient.
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
e-Valuate: A Two-player Game on Arithmetic Expressions -- An Update
Aravamuthan, Sarang, Ganguly, Biswajit
e-Valuate is a game on arithmetic expressions. The players have contrasting roles of maximizing and minimizing the given expression. The maximizer proposes values and the minimizer substitutes them for variables of his choice. When the expression is fully instantiated, its value is compared with a certain minimax value that would result if the players played to their optimal strategies. The winner is declared based on this comparison. We use a game tree to represent the state of the game and show how the minimax value can be computed efficiently using backward induction and alpha-beta pruning. The efficacy of alpha-beta pruning depends on the order in which the nodes are evaluated. Further improvements can be obtained by using transposition tables to prevent reevaluation of the same nodes. We propose a heuristic for node ordering. We show how the use of the heuristic and transposition tables lead to improved performance by comparing the number of nodes pruned by each method. We describe some domain-specific variants of this game. The first is a graph theoretic formulation wherein two players share a set of elements of a graph by coloring a related set with each player looking to maximize his share. The set being shared could be either the set of vertices, edges or faces (for a planar graph). An application of this is the sharing of regions enclosed by a planar graph where each player's aim is to maximize the area of his share. Another variant is a tiling game where the players alternately place dominoes on a $8 \times 8$ checkerboard to construct a maximal partial tiling. We show that the size of the tiling $x$ satisfies $22 \le x \le 32$ by proving that any maximal partial tiling requires at least $22$ dominoes.
- North America > United States > New Jersey (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > India > Tamil Nadu > Chennai (0.04)